Hybrid transform scheme for video coding

ABSTRACT

An apparatus for decoding a current block from an encoded bitstream includes a memory and a processor. The processor is configured to execute instructions stored in the memory to decode, from the encoded bitstream, a prediction mode of the current block and decode the current block using a transform type selected from a set that includes only a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT). If the prediction mode is an inter prediction mode, the transform type used is the SDST. If the prediction mode is an intra prediction mode, the transform type used is the 2D DCT.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to and the benefit of U.S. patent application Ser. No. 14/950,024, filed Nov. 24, 2015, the entire disclosure of which is hereby incorporated by reference.

BACKGROUND

Digital video streams typically represent video using a sequence of frames or still images. Each frame can include a number of blocks, which in turn may contain information describing the value of color, brightness or other attributes for pixels. The amount of data in a typical video stream is large, and transmission and storage of video can use significant computing or communications resources. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques. These techniques often involve transformation into the frequency domain.

SUMMARY

This disclosure relates in general to encoding and decoding visual data, such as video stream data, using a hybrid transform scheme. In particular, a hybrid scheme that uses both discrete cosine transforms and symmetrical discrete sine transforms for inter-predicted blocks is described.

An apparatus for decoding a current block from an encoded bitstream includes a memory and a processor. The processor is configured to execute instructions stored in the memory to decode, from the encoded bitstream, a prediction mode of the current block and decode the current block using a transform type selected from a set that includes only a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT). If the prediction mode is an inter prediction mode, the transform type used is the SDST. If the prediction mode is an intra prediction mode, the transform type used is the 2D DCT.

Another apparatus for decoding a current block from an encoded bitstream includes a memory and a processor. The processor is configured to execute instructions stored in the memory to decode, from the encoded bitstream, a prediction mode of the current block; on condition that the prediction mode is an inter prediction mode, decode the current block using a transform type selected from a set that includes only a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT); and, on condition that the prediction mode is an intra prediction mode, decode the current block using a first one-dimensional transform and a second one-dimensional transform. The current block has a horizontal dimension and a vertical dimension. The first one-dimensional transform is used in one of the horizontal dimension and the vertical dimension. The second one-dimensional transform is used in the other of the horizontal dimension and the vertical dimension.

A method for decoding a current block from an encoded bitstream includes decoding, from the encoded bitstream, a prediction mode of the current block, the current block comprising rows and columns; on condition that the prediction mode is an inter prediction mode, decoding the current block using a transform type selected from a set that includes only a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT); and, on condition that the prediction mode is an intra prediction mode, decoding the current block using a first one-dimensional transform and a second one-dimensional transform.

Variations in these and other aspects of this disclosure will be described in additional detail hereafter.

BRIEF DESCRIPTION OF THE DRAWINGS

The description herein makes reference to the accompanying drawings wherein like reference numerals refer to like parts throughout the several views, and wherein:

FIG. 1 is a schematic of a video encoding and decoding system;

FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station;

FIG. 3 is a diagram of a typical video stream to be encoded and subsequently decoded;

FIG. 4 is a block diagram of a video compression system according to an aspect of the teachings herein;

FIG. 5 is a block diagram of a video decompression system according to another aspect of the teachings herein; and

FIG. 6 is a flowchart diagram of a process for encoding a video signal using a hybrid transform scheme.

DETAILED DESCRIPTION

A video stream may be compressed by a variety of techniques to reduce the bandwidth required to transmit or store the video stream. A video stream can be encoded into a bitstream, which can involve compression, and then transmitted to a decoder that can decode or decompress the video stream to prepare it for viewing or further processing. Encoding a video stream can involve parameters that make trade-offs between video quality and bitstream size, where increasing the perceived quality of a decoded video stream can increase the number of bits required to transmit or store the bitstream.

One technique to achieve superior compression performance exploits spatial and temporal correlation of video signals through spatial and/or motion compensated prediction. Transform coding subsequent to prediction is another technique that improves video compression. Generally, transform coding aims to largely remove the statistical redundancy between residual pixels after prediction. Compression performance of a transform relies on the ability to de-correlate residual pixel redundancy and compact the energy into a subset of transform coefficients.

One common sinusoidal-based transform type used for such decorrelation is the discrete cosine transform (DCT). The DCT has long been used for motion compensated prediction residuals as a near-optimal approximation of the Karhunen-Loeve transform (KLT). According to the teachings herein, however, it is noted that a symmetric discrete sine transform (SDST) can better capture the statistical properties of certain classes of residual pixels. The hybrid SDST and DCT coding scheme proposed herein efficiently expands the set of effective projection angles, which allows the resulting coefficients to represent residual signals more compactly while keeping overhead cost and encoding complexity minimal.

FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of transmitting station 102 are possible. For example, the processing of transmitting station 102 can be distributed among multiple devices.

A network 104 can connect transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in transmitting station 102 and the encoded video stream can be decoded in receiving station 106. Network 104 can be, for example, the Internet. Network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network or any other means of transferring the video stream from transmitting station 102 to, in this example, receiving station 106.

Receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of receiving station 106 are possible. For example, the processing of receiving station 106 can be distributed among multiple devices.

Other implementations of video encoding and decoding system 100 are possible. For example, an implementation can omit network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to receiving station 106 or any other device having memory. In one implementation, receiving station 106 receives (e.g., via network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding. In an example implementation, a real-time transport protocol (RTP) is used for transmission of the encoded video over network 104. In another implementation, a transport protocol other than RTP may be used, e.g., an HTTP-based video streaming protocol.

When used in a video conferencing system, for example, transmitting station 102 and/or receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., transmitting station 102) to decode and view and further encodes and transmits its own video bitstream to the video conference server for decoding and viewing by other participants.

FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, computing device 200 can implement one or both of transmitting station 102 and receiving station 106 of FIG. 1. Computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of a single computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.

A CPU 202 in computing device 200 can be a conventional central processing unit. Alternatively, CPU 202 can be any other type of device, or multiple devices, capable of manipulating or processing information now existing or hereafter developed. Although the disclosed implementations can be practiced with a single processor as shown, e.g., CPU 202, advantages in speed and efficiency can be achieved using more than one processor.

A memory 204 in computing device 200 can be a read only memory (ROM) device or a random access memory (RAM) device in an implementation. Any other suitable type of storage device can be used as memory 204. Memory 204 can include code and data 206 that are accessed by CPU 202 using a bus 212. Memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits CPU 202 to perform the methods described here. For example, application programs 210 can include applications 1 through N, which further include a video coding application that performs the methods described here. Computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device 200. Because the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in secondary storage 214 and loaded into memory 204 as needed for processing.

Computing device 200 can also include one or more output devices, such as a display 218. Display 218 may be, in one example, a touch sensitive display that combines a display with a touch sensitive element that is operable to sense touch inputs. Display 218 can be coupled to CPU 202 via bus 212. Other output devices that permit a user to program or otherwise use computing device 200 can be provided in addition to or as an alternative to display 218. When the output device is or includes a display, the display can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT) display or light emitting diode (LED) display, such as an OLED display.

Computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image such as the image of a user operating computing device 200. Image-sensing device 220 can be positioned such that it is directed toward the user operating computing device 200. In an example, the position and optical axis of image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to display 218 and from which display 218 is visible.

Computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near computing device 200. Sound-sensing device 222 can be positioned such that it is directed toward the user operating computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates computing device 200.

Although FIG. 2 depicts CPU 202 and memory 204 of computing device 200 as being integrated into a single unit, other configurations can be utilized. The operations of CPU 202 can be distributed across multiple machines (each machine having one or more processors) that can be coupled directly or across a local area or other network. Memory 204 can be distributed across multiple machines such as a network-based memory or memory in multiple machines performing the operations of computing device 200. Although depicted here as a single bus, bus 212 of computing device 200 can be composed of multiple buses. Further, secondary storage 214 can be directly coupled to the other components of computing device 200 or can be accessed via a network and can comprise a single integrated unit such as a memory card or multiple units such as multiple memory cards. Computing device 200 can thus be implemented in a wide variety of configurations.

FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. Video stream 300 includes a video sequence 302. At the next level, video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as adjacent frames 304, video sequence 302 can include any number of adjacent frames 304. Adjacent frames 304 can then be further subdivided into individual frames, e.g., a single frame 306. At the next level, a single frame 306 can be divided into a series of segments or planes 308. Segments (or planes) 308 can be subsets of frames that permit parallel processing, for example. Segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, a frame 306 of color video data can include a luminance plane and two chrominance planes. Segments 308 may be sampled at different resolutions.

Whether or not frame 306 is divided into segments 308, frame 306 may be further subdivided into blocks 310, which can contain data corresponding to, for example, 16×16 pixels in frame 306. Blocks 310 can also be arranged to include data from one or more planes 308 of pixel data. Blocks 310 can also be of any other suitable size such as 4×4 pixels, 8×8 pixels, 16×8 pixels, 8×16 pixels, 16×16 pixels or larger. Unless otherwise noted, the terms block and macroblock are used interchangeably herein. Frame 306 may be partitioned according to the teachings herein as discussed in more detail below.
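As a rough illustration of subdividing a frame into blocks, the following Python sketch (not part of the described implementation) splits a frame, represented as a list of pixel rows, into 16×16 blocks in raster-scan order. The function name split_into_blocks is hypothetical, and letting edge blocks be smaller when the frame size is not a multiple of the block size is a simplifying assumption; practical encoders often pad the frame instead.

```python
def split_into_blocks(frame, block_size=16):
    """Split a 2D list of pixel values into blocks in raster-scan order.

    Returns a list of ((top, left), block) pairs. Blocks on the right and
    bottom edges are simply smaller when the frame dimensions are not
    multiples of block_size (a simplifying assumption of this sketch).
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    blocks = []
    for top in range(0, height, block_size):        # block rows, top to bottom
        for left in range(0, width, block_size):    # blocks within a row, left to right
            block = [row[left:left + block_size]
                     for row in frame[top:top + block_size]]
            blocks.append(((top, left), block))
    return blocks


# A 32-row by 48-column frame yields 2 x 3 = 6 blocks of 16x16 pixels.
frame = [[r * 48 + c for c in range(48)] for r in range(32)]
blocks = split_into_blocks(frame)
```

The (top, left) position kept with each block makes it easy to place reconstructed blocks back into the frame in the same scan order.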

FIG. 4 is a block diagram of an encoder 400 in accordance with an implementation. Encoder 400 can be implemented, as described above, in transmitting station 102 such as by providing a computer software program stored in memory, for example, memory 204. The computer software program can include machine instructions that, when executed by a processor such as CPU 202, cause transmitting station 102 to encode video data in the manner described in FIG. 4. Encoder 400 can also be implemented as specialized hardware included in, for example, transmitting station 102. Encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using input video stream 300: an intra/inter prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. Encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of encoder 400 can be used to encode video stream 300.

When video stream 300 is presented for encoding, each frame 306 can be processed in units of blocks. At intra/inter prediction stage 402, each block can be encoded using intra-frame prediction (also called intra prediction) or inter-frame prediction (also called inter prediction). In any case, a prediction block can be formed. In the case of intra prediction, a prediction block may be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter prediction, a prediction block may be formed from samples in one or more previously reconstructed reference frames.

Next, still referring to FIG. 4, the prediction block can be subtracted from the current block at intra/inter prediction stage 402 to produce a residual block (also called a residual). Transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. According to the process described further below with respect to FIG. 6, the residual block may be transformed according to the hybrid SDST and DCT scheme at transform stage 404. In one example of application of a transform, the DCT transforms the residual block into the frequency domain where the transform coefficient values are based on spatial frequency. The lowest frequency (DC) coefficient is at the top-left of the matrix, and the highest frequency coefficient is at the bottom-right of the matrix. Note that the size of the prediction block, and hence the residual block, may be different from the size of the transform block as also discussed in more detail below with respect to FIG. 6.
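The following Python sketch illustrates, under stated assumptions, how a 2D DCT places the lowest-frequency (DC) energy of a residual at the top-left of the coefficient matrix. It applies an orthonormal DCT-II separably (columns, then rows); the helper names dct_1d and dct_2d are hypothetical, and a practical codec would use a scaled integer approximation rather than floating-point arithmetic.

```python
import math


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence of samples."""
    size = len(x)
    out = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        total = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
                    for n in range(size))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform each column, then each row.

    Row index = vertical frequency, column index = horizontal frequency,
    so coeffs[0][0] is the DC coefficient.
    """
    cols = [dct_1d(list(col)) for col in zip(*block)]  # column transforms
    rows = [list(r) for r in zip(*cols)]               # back to row-major order
    return [dct_1d(r) for r in rows]                   # row transforms


# A flat residual puts all of its energy into the DC coefficient.
flat = [[3] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
```

For the flat 4×4 residual above, coeffs[0][0] is approximately 12 and every other coefficient is approximately zero. A residual that varies only horizontally likewise leaves only the first (top) row of coefficients nonzero.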

Quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated. The quantized transform coefficients are then entropy encoded by entropy encoding stage 408. The entropy-encoded coefficients, together with other information used to decode the block, which may include for example the type of prediction used, transform type, motion vectors and quantizer value, are then output to the compressed bitstream 420. Compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding. Compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms will be used interchangeably herein.
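A minimal sketch of the divide-and-truncate quantization described above, together with the matching dequantization (multiplication by the quantizer value) used on the reconstruction path. A single scalar quantizer value q for every coefficient is an assumption made for brevity, and the function names are hypothetical.

```python
def quantize(coeffs, q):
    """Divide each transform coefficient by the quantizer value q and truncate toward zero."""
    return [[int(c / q) for c in row] for row in coeffs]


def dequantize(levels, q):
    """Approximately undo quantization by multiplying each level by q."""
    return [[level * q for level in row] for row in levels]


levels = quantize([[12.0, -7.5], [3.9, 0.4]], 4)
restored = dequantize(levels, 4)
```

Here quantize returns [[3, -1], [0, 0]] and dequantizing those levels gives [[12, -4], [0, 0]]; the difference from the original coefficients is the information discarded by quantization.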

The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that both encoder 400 and a decoder 500 (described below) use the same reference frames to decode compressed bitstream 420. The reconstruction path performs functions that are similar to functions that take place during the decoding process that are discussed in more detail below, including dequantizing the quantized transform coefficients at dequantization stage 410 and inverse transforming the dequantized transform coefficients at inverse transform stage 412 to produce a derivative residual block (also called a derivative residual). At reconstruction stage 414, the prediction block that was predicted at intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed block. Loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.
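The reconstruction step itself is a per-pixel addition. The sketch below adds a derivative residual to a prediction block and clips the result to the valid pixel range; the function name is hypothetical, and the rounding and clipping to an 8-bit range [0, 255] are assumptions, since the text does not specify a bit depth.

```python
def reconstruct(prediction, derivative_residual, bit_depth=8):
    """Add the derivative residual to the prediction block, pixel by pixel.

    Each sum is rounded to an integer and clipped to [0, 2**bit_depth - 1];
    the 8-bit default is an assumption of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(round(p + r)), 0), max_val)
             for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, derivative_residual)]


block = reconstruct([[250, 10], [100, 100]], [[10, -20], [1.4, -2.6]])
```

With these inputs the first row saturates at both ends of the range, giving [[255, 0], [101, 97]].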

Other variations of encoder 400 can be used to encode compressed bitstream 420. For example, a non-transform based encoder 400 can quantize the residual signal directly without transform stage 404 for certain blocks or frames. In another implementation, an encoder 400 can have quantization stage 406 and dequantization stage 410 combined into a single stage.

FIG. 5 is a block diagram of a decoder 500 in accordance with another implementation. Decoder 500 can be implemented in receiving station 106, for example, by providing a computer software program stored in memory 204. The computer software program can include machine instructions that, when executed by a processor such as CPU 202, cause receiving station 106 to decode video data in the manner described in FIG. 5. Decoder 500 can also be implemented in hardware included in, for example, transmitting station 102 or receiving station 106.

Decoder 500, similar to the reconstruction path of encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512 and a deblocking filtering stage 514. Other structural variations of decoder 500 can be used to decode compressed bitstream 420.

When compressed bitstream 420 is presented for decoding, the data elements within compressed bitstream 420 can be decoded by entropy decoding stage 502 as discussed in additional detail herein to produce a set of quantized transform coefficients. Dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and inverse transform stage 506 inverse transforms the dequantized transform coefficients using the selected transform type to produce a derivative residual that can be identical to that created by inverse transform stage 412 in encoder 400. Using header information decoded from compressed bitstream 420, decoder 500 can use intra/inter prediction stage 508 to create the same prediction block as was created in encoder 400, e.g., at intra/inter prediction stage 402. At reconstruction stage 510, the prediction block can be added to the derivative residual to create a reconstructed block. Loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts. Other filtering can be applied to the reconstructed block. In this example, deblocking filtering stage 514 is applied to the reconstructed block to reduce blocking distortion, and the result is output as output video stream 516. Output video stream 516 can also be referred to as a decoded video stream, and the terms will be used interchangeably herein.

Other variations of decoder 500 can be used to decode compressed bitstream 420. For example, decoder 500 can produce output video stream 516 without deblocking filtering stage 514.

As mentioned briefly above, residuals (and particularly inter-frame residuals) are often coded using the DCT as a theoretical approximation of the optimal transformation, the Karhunen-Loeve transform (KLT), with further desirable properties such as independence from signal statistics and a fast computation flow. The DCT approximates the KLT well under the assumption that the signal follows a Gauss-Markov model, which is the case for typical natural image pixels. Its efficacy for motion compensated prediction residuals, however, is questionable, as the correlation among prediction residuals is much lower and the Gauss-Markov model is not a good fit in many circumstances.

Described herein for use in encoding prediction residuals is a symmetric discrete sine transform (SDST). The SDST kernel may be defined as:

${X(k)} = {\sum\limits_{n = 0}^{N - 1}\; {{x(n)}*{{\sin \left( \frac{\left( {n + 1} \right)*\left( {k + 1} \right)\pi}{\left( {N + 1} \right)} \right)}.}}}$

In this equation, n is the time (pixel) domain index, k is the frequency domain index, N is the number of pixel values within the prediction residual, x(n) is the time (pixel) domain signal (e.g., the pixel value for the pixel at index n), and X(k) is the transform domain representation at index k.
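The SDST equation can be checked numerically. The Python sketch below builds the kernel matrix directly from the equation, with hypothetical function names. This kernel has the form of the type-I discrete sine transform: it is symmetric, and multiplying the kernel matrix by itself gives (N + 1)/2 times the identity, so applying the same kernel a second time and scaling by 2/(N + 1) recovers the input. The inverse shown here relies on that identity; a practical codec would instead use a scaled integer approximation.

```python
import math


def sdst_matrix(size):
    """Kernel matrix S[k][n] = sin((n + 1)(k + 1) * pi / (N + 1)) for N = size."""
    return [[math.sin((n + 1) * (k + 1) * math.pi / (size + 1)) for n in range(size)]
            for k in range(size)]


def sdst(x):
    """Forward SDST: X(k) = sum over n of x(n) * S[k][n] (unnormalized)."""
    s = sdst_matrix(len(x))
    return [sum(s[k][n] * x[n] for n in range(len(x))) for k in range(len(x))]


def inverse_sdst(coeffs):
    """Inverse SDST using S times S = (N + 1) / 2 * I: reapply the kernel, then rescale."""
    size = len(coeffs)
    return [2.0 / (size + 1) * v for v in sdst(coeffs)]


residual_row = [1.0, 2.0, 3.0, 4.0]
recovered = inverse_sdst(sdst(residual_row))
```

Because the kernel is its own inverse up to a scale factor, the same routine can serve both the forward and inverse transform stages, which is one reason the symmetric form is convenient.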

Like the DCT kernel, the SDST kernel has the property that it is independent of signal statistics. The SDST provides a projection angle rather distant from that of the DCT. Accordingly, it effectively captures the statistical properties of a certain class of signals. The proposed hybrid SDST/DCT coding scheme switches between the two transform types for motion compensated prediction residuals. Alternating between SDST and DCT allows a video coder to represent an original residual signal in a more compact form, i.e., with better energy compaction and signal decorrelation, than using only the DCT, for example.
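To make the idea of a different projection concrete, the following Python sketch compares how much of a residual row's energy lands in the first coefficient under an orthonormal DCT-II and under the SDST, scaled by sqrt(2/(N + 1)) so that it is orthonormal. The residual row is contrived: it is small near both block edges and largest in the middle, and it happens to match the first SDST basis function exactly. It illustrates the energy-compaction argument for one signal shape only and is not a measurement of coding gain. Function names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence."""
    size = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / size)
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * size)) for n in range(size))
            for k in range(size)]


def sdst_1d(x):
    """SDST kernel from the equation above, scaled by sqrt(2 / (N + 1)) to be orthonormal."""
    size = len(x)
    scale = math.sqrt(2.0 / (size + 1))
    return [scale * sum(x[n] * math.sin((n + 1) * (k + 1) * math.pi / (size + 1))
                        for n in range(size))
            for k in range(size)]


def first_coefficient_energy_share(coeffs):
    """Fraction of the total coefficient energy held by the first coefficient."""
    return coeffs[0] ** 2 / sum(c * c for c in coeffs)


# Contrived residual row: small near both block edges, largest in the middle.
N = 4
residual_row = [math.sin((n + 1) * math.pi / (N + 1)) for n in range(N)]
dct_share = first_coefficient_energy_share(dct_1d(residual_row))
sdst_share = first_coefficient_energy_share(sdst_1d(residual_row))
```

Here sdst_share is essentially 1.0, while dct_share is roughly 0.95, with the remaining energy in a higher-frequency DCT coefficient. For a flat row the situation reverses, which is why switching between the two transforms can pay off.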

One example of implementing such a hybrid scheme is described next with respect to FIG. 6, which is a flowchart diagram of a process 600 for encoding a video signal using a hybrid transform scheme. Process 600 can be implemented in a system such as computing device 200 to aid the encoding of a video stream. Process 600 can be implemented, for example, as a software program that is executed by a computing device such as transmitting station 102 or receiving station 106. The software program can include machine-readable instructions that are stored in a memory such as memory 204 that, when executed by a processor such as CPU 202, cause the computing device to perform process 600. Process 600 can also be implemented using hardware in whole or in part. As explained above, some computing devices may have multiple memories and multiple processors, and the steps or operations of process 600 may in such cases be distributed using different processors and memories. Use of the terms “processor” and “memory” in the singular herein encompasses computing devices that have only one processor or one memory as well as devices having multiple processors or memories that may each be used in the performance of some but not necessarily all recited steps.

For simplicity of explanation, process 600 is depicted and described as a series of steps or operations. However, steps and operations in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, steps or operations in accordance with this disclosure may occur with other steps or operations not presented and described herein. Furthermore, not all illustrated steps or operations may be required to implement a method in accordance with the disclosed subject matter. Process 600 is depicted for encoding of a single block of a single frame. Process 600 may be repeated for some or all blocks of the single frame and/or be repeated for each frame of the input signal. Blocks may be processed in any scan order, such as raster-scan order.

Process 600 initiates by receiving an input signal at operation 602. The input signal is a video signal to be encoded. Receiving the signal can include receiving the signal from a video screen, receiving it from a video camera, retrieving the signal from a memory device within or coupled to a processor or remote from the processor, or any other way of receiving the signal for processing.

At operation 604, a residual is generated using a current block to be encoded from a frame of the signal. As discussed above, a residual block may be generated, calculated or otherwise produced by selecting a prediction mode and generating a prediction block using the prediction mode, where a difference between the prediction block and the current block is the residual block, also called the residual.
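Operation 604 reduces to a per-pixel subtraction once a prediction block is available. A minimal Python sketch, with a hypothetical function name and the sign convention of current block minus prediction block (the convention used at reconstruction, where the prediction is added back):

```python
def residual_block(current, prediction):
    """Residual = current block minus prediction block, pixel by pixel."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


residual = residual_block([[10, 12], [8, 9]], [[9, 12], [10, 7]])
```

For these blocks the residual is [[1, 0], [-2, 2]]; a good prediction leaves small values that the transform can represent with few significant coefficients.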

At operation 606, a query is made regarding the prediction mode used. As shown, a query is made as to whether an inter prediction mode was used to generate the residual. However, the query could ask instead whether an intra prediction mode was used to generate the residual. When an inter prediction mode was used to generate the residual, process 600 advances to operation 608.

At operation 608, two different transform modes (and hence, transforms) are respectively applied to the residual. In this example, the residual is transformed using the DCT, and the residual is also transformed using the SDST. The transforms can be applied to the residual sequentially, or separately but concurrently. This operation results in two transform blocks comprising a plurality of transform coefficients. Each transform block is encoded and a rate-distortion value associated with the transform block is calculated at operation 610.

Encoding a transform block at operation 608 generally includes quantizing the transform coefficients of the transform block and generating header information including how the block was encoded. In some implementations, quantizing the transform coefficients may be omitted and encoding at operation 608 is completed by generating the header information only. In order to calculate rate-distortion values at operation 610, the encoded block is decoded using the header information. Operation 608 forms part of the rate-distortion loop for encoding the current block. A rate-distortion loop determines the rate, or number of bits output from the encoding process, versus the distortion, or change in visual quality of the video stream as a result of encoding and decoding. Distortion can be measured in a number of different ways including measuring the mean squared error (difference) between the data of the video stream before encoding and decoding and the data of the video stream following encoding and decoding. Thus, a rate-distortion value is a measure of the number of bits required to represent the encoded block (or other subdivision of a video stream) for a given level of distortion.
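One common way to fold rate and distortion into a single rate-distortion value is the Lagrangian cost J = D + λR, where D is the distortion, R is the rate in bits and λ is a weighting factor. The description above does not fix a particular formula, so the Python sketch below is an assumed example: it uses mean squared error as D, as mentioned above, and the function names and λ value are hypothetical.

```python
def mean_squared_error(original, reconstructed):
    """Distortion as the mean of squared pixel differences."""
    diffs = [(o - r) ** 2
             for orig_row, rec_row in zip(original, reconstructed)
             for o, r in zip(orig_row, rec_row)]
    return sum(diffs) / len(diffs)


def rd_cost(distortion, rate_bits, lagrange_multiplier):
    """Assumed Lagrangian rate-distortion value J = D + lambda * R."""
    return distortion + lagrange_multiplier * rate_bits


# Comparing two candidate encodings of the same block with lambda = 0.5:
dct_cost = rd_cost(2.0, 40, 0.5)    # lower distortion, more bits
sdst_cost = rd_cost(3.0, 20, 0.5)   # higher distortion, fewer bits
```

With this weighting the second candidate wins (13.0 versus 22.0) even though its distortion is higher; a smaller λ would shift the balance toward lower distortion.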

In contrast, if an intra prediction mode is used as indicated by the response to the query of operation 606, process 600 advances to operation 612. At operation 612, one or more transform modes are respectively applied to the residual. Each of these transform modes is other than the SDST transform mode. In one implementation, the only transform modes available are the DCT and the SDST. Accordingly, the residual generated by intra prediction is transformed using the DCT. This implementation has the desirable result that only one bit need be coded in a header to indicate the type of transform used as the transform mode. The size of the transform used as the transform mode, as discussed in more detail below, could be separately signaled if sized differently from the prediction block and hence different from the residual. For example, the size could be transmitted by including a coding tree within the bitstream that describes the partitioning of blocks into prediction blocks and transform blocks. Other ways of transmitting the size are possible to the extent the sizes of the residual and transform block (also referred to as sub-blocks of the residual herein) are different.
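When only the DCT and the SDST are available, the transform type fits in a single header bit. The Python sketch below shows one possible mapping; the bit values (0 for DCT, 1 for SDST) and the function names are hypothetical, and whether the bit is also sent for intra-predicted blocks, where its value is implied, is not specified above. The sketch simply produces a bit for every block and rejects the SDST for intra residuals.

```python
from enum import IntEnum


class TransformType(IntEnum):
    """Hypothetical one-bit code for the transform type."""
    DCT = 0
    SDST = 1


def encode_transform_bit(is_inter, transform_type):
    """Return the single header bit for a block's transform type.

    In the implementation described above, only inter-predicted residuals
    may use the SDST; intra-predicted residuals use the DCT.
    """
    if not is_inter and transform_type != TransformType.DCT:
        raise ValueError("intra-predicted residuals use the DCT in this scheme")
    return int(transform_type)


def decode_transform_bit(bit):
    """Map a decoded header bit back to a transform type."""
    return TransformType(bit)


bit = encode_transform_bit(True, TransformType.SDST)
```

On the decoder side, decode_transform_bit(bit) selects which inverse transform the inverse transform stage applies.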

In some implementations, other transform modes are possible. Another two-dimensional (2D) transform other than the DCT may be applied to the residual. An alternative that is desirable with residuals that have been generated via intra prediction is to perform a series of one-dimensional (1D) transforms on the rows and columns of the block. For example, the 2D array of residual pixels can be transformed by first applying 1D transforms to the columns (vertically-arranged pixels) of a block followed by applying 1D transforms to the rows (horizontally-arranged pixels), or vice-versa. Generally, the variance of the residual generated using intra prediction will be lowest at the prediction edge and will be highest at the opposite side of the prediction edge. For this reason, the use of different kernels for the rows and columns of the block may be desirable.
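A separable transform with different kernels per direction can be sketched as below: one 1D transform is applied to every column, then another to every row. The document does not give an ADST kernel, so this Python sketch uses the orthonormal type-VII DST (DST-VII), a common ADST formulation, as a stand-in for the column transform and an orthonormal DCT-II for the rows. The function names are hypothetical. Because both kernels are orthonormal, the total energy of the block is preserved.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence."""
    size = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / size)
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * size)) for n in range(size))
            for k in range(size)]


def dst7_1d(x):
    """Orthonormal DST-VII, used here as a stand-in for the ADST."""
    size = len(x)
    scale = 2.0 / math.sqrt(2 * size + 1)
    return [scale * sum(x[n] * math.sin(math.pi * (2 * k + 1) * (n + 1) / (2 * size + 1))
                        for n in range(size))
            for k in range(size)]


def separable_2d(block, column_transform, row_transform):
    """Apply column_transform to every column, then row_transform to every row."""
    cols = [column_transform(list(col)) for col in zip(*block)]  # vertical pass
    rows = [list(r) for r in zip(*cols)]                         # back to row-major order
    return [row_transform(r) for r in rows]                      # horizontal pass


residual = [[1, 2, 3, 4], [2, 0, 1, 5], [7, 1, 0, 2], [3, 3, 1, 1]]
coeffs = separable_2d(residual, dst7_1d, dct_1d)  # DST-VII vertically, DCT horizontally
```

Swapping the two arguments gives the opposite assignment (DCT vertically, DST-VII horizontally), so a single helper covers each kernel combination.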

In one example of this variation, combinations of a 1D DCT and a 1D asymmetric discrete sine transform (ADST) may be selected such that their basis functions match the pattern of the residual (i.e., the directionality of the prediction mode). For horizontal and horizontal-like intra prediction modes that rely primarily on values in the column to the left of the current block, the ADST may be used in the horizontal direction and the DCT in the vertical direction. Similarly, for vertical and vertical-like intra prediction modes that rely primarily on values in the row of pixels above the current block, the ADST may be used in the vertical direction and the DCT in the horizontal direction. For generally diagonal modes that rely on both top-row and left-column values in a substantially similar manner, the ADST may be used in both the horizontal and vertical directions. For other intra prediction modes, there may be no particular benefit to be gained from using the ADST in either direction. Accordingly, the 1D DCT may be used in each of the horizontal and vertical directions, or the 2D DCT may be used for the entire residual as described above with regard to operation 608.
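The mode-to-kernel rule above amounts to a small lookup table. In this hedged Python sketch, the `IntraClass` grouping is hypothetical: a real codec would first map each of its own intra prediction modes onto one of these classes.

```python
from enum import Enum


class Kernel(Enum):
    DCT = "1D DCT"
    ADST = "1D ADST"


class IntraClass(Enum):
    # Hypothetical grouping of intra prediction modes by the neighbors they use.
    HORIZONTAL = "horizontal or horizontal-like"  # mainly the left column
    VERTICAL = "vertical or vertical-like"        # mainly the row above
    DIAGONAL = "generally diagonal"               # row above and left column alike
    OTHER = "other"                               # any remaining intra mode


# (vertical-direction kernel, horizontal-direction kernel) for each class.
KERNELS = {
    IntraClass.HORIZONTAL: (Kernel.DCT, Kernel.ADST),
    IntraClass.VERTICAL: (Kernel.ADST, Kernel.DCT),
    IntraClass.DIAGONAL: (Kernel.ADST, Kernel.ADST),
    IntraClass.OTHER: (Kernel.DCT, Kernel.DCT),
}
```

Because the 2D DCT is separable, the `(Kernel.DCT, Kernel.DCT)` entry produces the same coefficients as applying the 2D DCT to the whole residual.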

For each of the transform modes applied at operation 612, process 600 generates rate-distortion values at operation 614 in the same manner as at operation 610. Regardless of whether the rate-distortion values are generated at operation 610 or at operation 614, process 600 advances to operation 616 to determine whether more prediction modes are available for testing.
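This section does not restate how a rate-distortion value is formed. A common choice, assumed here, is the Lagrangian cost J = D + λR, where D is the sum of squared errors between the original and reconstructed block, R is the number of bits a trial encoding produces, and λ is an encoder-chosen multiplier. A minimal Python sketch:

```python
def sum_squared_error(original, reconstructed):
    """Distortion: sum of squared pixel differences between two blocks."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for o, r in zip(row_o, row_r))


def rd_value(original, reconstructed, bits, lagrange_multiplier):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return sum_squared_error(original, reconstructed) + lagrange_multiplier * bits
```

A larger multiplier favors modes that spend fewer bits; a smaller one favors modes that reconstruct the block more faithfully.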

As mentioned briefly above, prediction modes encompass inter- and intra-prediction. Intra-prediction may include a number of modes indicating the direction of pixel propagation used to generate the prediction block. In some cases, prediction modes may be associated with a size. For example, the block may be a large block that is predicted according to a number of prediction modes, such as a 4×4 inter-prediction mode, an 8×8 inter-prediction mode, several 4×4 intra-prediction modes and several 8×8 intra-prediction modes, by appropriately dividing the large block.

If additional prediction modes are available, process 600 advances to operation 618, where the rate-distortion values calculated at operation 610 or at operation 614 are stored for later use. Then, process 600 is performed for the next prediction mode, starting with the generation of the residual at operation 604.

If no additional prediction modes are available, process 600 advances to operation 620 to select the transform mode and prediction mode that result in the lowest rate-distortion value for encoding the current block. This may be achieved at operation 620 by comparing the various generated rate-distortion values. The modes associated with the lowest rate-distortion value are desirably selected to encode the block. Generally, multiple prediction modes are used to generate a number of residual blocks in process 600. However, in a simple example assuming that only one inter prediction mode and only one intra prediction mode are available, the lowest of a first rate-distortion value for encoding the inter-predicted residual block using the DCT, a second rate-distortion value for encoding the inter-predicted residual block using the SDST, and a third rate-distortion value for encoding the intra-predicted residual block using a transform other than the SDST would be used to decide whether to use the inter prediction mode with the DCT, the inter prediction mode with the SDST, or the intra prediction mode with the transform mode used at operation 612.
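The comparison at operation 620 reduces to taking a minimum over the stored values. This Python sketch uses the three-candidate example above; the tuple keys are a hypothetical representation of the mode pairs, and the numbers are illustrative placeholders, not measured results.

```python
def select_modes(rd_values):
    """Return the (prediction mode, transform mode) pair with the lowest RD value.

    rd_values maps (prediction_mode, transform_mode) -> rate-distortion value.
    Ties go to the pair inserted first, because min() keeps the first minimum.
    """
    return min(rd_values, key=rd_values.get)


# Illustrative values only, not measured results.
example = {
    ("inter", "DCT"): 120.0,
    ("inter", "SDST"): 95.5,
    ("intra", "DCT"): 130.0,
}
best = select_modes(example)  # ("inter", "SDST")
```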

After operation 620, the block is encoded into an encoded bitstream at operation 622. Encoding the resulting transform block may include entropy coding the transform block by entropy coding the transform coefficients in a scan order such as a zig-zag scan order. In some cases, encoding the resulting transform block includes quantizing the transform coefficients of the transform block and then entropy coding the transform block by entropy coding the quantized transform coefficients in a scan order such as a zig-zag scan order.
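The optional quantization and zig-zag serialization that precede entropy coding can be sketched in Python as follows. The uniform step size and round-to-nearest rule are assumptions for illustration, and the entropy coder itself is not shown; the point is that the scan groups the trailing zeros, which an entropy coder can represent cheaply.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s indexes the anti-diagonal row + col
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def quantize(coefficient, step):
    """Uniform quantization, rounding half away from zero (assumed rule)."""
    sign = -1 if coefficient < 0 else 1
    return sign * int(abs(coefficient) / step + 0.5)


def scan_block(coefficients, step=None):
    """Serialize a square coefficient block in zig-zag order, quantizing if a step is given."""
    n = len(coefficients)
    values = [coefficients[r][c] for r, c in zigzag_order(n)]
    if step is None:
        return values                          # quantization omitted
    return [quantize(v, step) for v in values]
```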

Although not expressly shown in FIG. 6, it is possible that the transform mode includes different transform sizes. For example, a minimum block size may be specified for the transform mode (for example, 4×4 pixels). In this case, if the residual block generated at operation 604 is larger than the minimum block size for the transform mode, additional processing may be included to partition the residual block into non-overlapping residual sub-blocks. Then, each sub-block could be processed according to operations 606-614. In this way, respective rate-distortion values are generated for each sub-block. Desirably, the same type of transform is applied to each sub-block residual so that the rate-distortion values can be combined for comparison with other prediction and transform modes for the current block to determine the best coding for the current block.

For example, the respective rate-distortion values generated for each residual sub-block of the residual block can be combined to generate a combined rate-distortion value for encoding the residual block using the DCT when the inter prediction mode is used. Similarly, the respective rate-distortion values generated for each residual sub-block of the residual block can be combined to generate a combined rate-distortion value for encoding the residual block using the SDST when inter prediction is used. When the prediction mode is the intra prediction mode, the respective rate-distortion values generated for each residual sub-block of the residual block can be combined to generate a combined rate-distortion value for encoding the residual block using a transform other than the SDST, such as the DCT, the ADST, or combinations of the DCT and the ADST. The values may be combined via summation or some other combining technique. In this way, encoding a residual block using one larger transform can be compared to encoding the residual block using smaller transforms. Selecting the transform mode and the prediction mode would thus include selecting the transform size (i.e., selecting whether to encode the current block by transforming the residual block or by transforming the residual sub-blocks).
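The sub-block bookkeeping can be sketched in Python as follows, assuming a square residual whose side is a multiple of each candidate sub-block size and summation as the combining technique. `rd_of_subblock` is a hypothetical callback standing in for transforming, coding and reconstructing one sub-block and returning its rate-distortion value.

```python
def split_into_subblocks(block, size):
    """Partition a square block into non-overlapping size x size sub-blocks (raster order)."""
    n = len(block)
    if n % size:
        raise ValueError("block side must be a multiple of the sub-block size")
    return [[row[c:c + size] for row in block[r:r + size]]
            for r in range(0, n, size) for c in range(0, n, size)]


def combined_rd_value(block, size, rd_of_subblock):
    """Sum per-sub-block RD values so different transform sizes can be compared."""
    return sum(rd_of_subblock(sub) for sub in split_into_subblocks(block, size))


def choose_transform_size(block, sizes, rd_of_subblock):
    """Pick the transform size whose combined RD value is lowest."""
    return min(sizes, key=lambda size: combined_rd_value(block, size, rd_of_subblock))
```

Choosing a size equal to the block side leaves a single sub-block, so the whole-block transform and the smaller transforms are scored on the same footing.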

Optionally, the selected coding may include whether or not quantization is performed as part of the encoding process.

In the description above, the encoding and calculation of rate-distortion values at operation 610 or operation 614 occur for each sub-block. That is, each sub-block transformed using the DCT is separately encoded and decoded to calculate respective rate-distortion values, which are then summed into a single rate-distortion value for the current block that is associated with the particular prediction mode and transform mode (the DCT and the transform size). Similarly, each sub-block transformed using the SDST or another transform is separately encoded and decoded to calculate respective rate-distortion values, which are then summed into a single rate-distortion value for the current block that is associated with the particular prediction mode and transform mode. More commonly, this calculation is done at the block level, not the sub-block level, because the header bits are generally associated with the block. For example, the transform coefficients resulting from the transformation of the sub-blocks using the DCT, the SDST or another transform are combined for encoding, optionally using quantization, and are decoded to generate a rate-distortion value for the whole residual block without calculating separate rate-distortion values for each sub-block. It is less desirable, but possible, to use combinations of transform types for the sub-blocks of a larger residual block. In such a case, the number of bits needed to signal the transform modes would increase.

As mentioned, the order of operations and content of process 600 may vary. For example, process 600 is described as selecting the prediction mode and transform mode on a per-block basis using a single rate-distortion loop. In one alternative implementation, the best inter prediction mode for a block using only the DCT may be selected in one rate-distortion loop, the best inter prediction mode for the block using only the SDST may be selected in a separate loop, and, for each available transform mode, the best intra prediction mode for the block using that transform mode may be selected in a respective rate-distortion loop. Other combinations are possible. In such examples, the best combination of prediction mode and transform mode is selected for the block. Further, process 600 uses the same transform type for all sub-blocks of a residual block when the block is partitioned at operation 620. This is expected to be more efficient for coding because the transform type need not be signaled for each sub-block, and decoding the block can rely upon one-time signaling of the transform type (including size) regardless of how many sub-blocks exist. Moreover, the rate-distortion loop is computationally intensive, and using the same transform type for all sub-blocks involves fewer computations than the alternatives. It is possible, however, that various combinations of transforms are used in generating rate-distortion values for the sub-blocks to select transform type(s) for those sub-blocks. The techniques described herein also work where additional processing is used to limit the number of prediction modes.

In some cases, not all of the generated rate-distortion values are compared at operation 620. For example, when multiple passes of the loop are performed (e.g., for different transform types, different block/sub-block sizes, or different prediction modes), the generated rate-distortion values may be compared as they are produced so that only the lowest rate-distortion value is stored in association with its prediction mode and transform mode (e.g., transform type and transform size). Then, each new rate-distortion value may be compared to that lowest value and stored if it is lower than the previously stored value, or discarded if it is not.
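The store-or-discard rule above can be sketched in Python as a small running-best tracker. The class name and the `(transform type, transform size)` tuple used for the transform mode are hypothetical; a candidate that merely ties the stored value is discarded, matching "stored if it is lower".

```python
import math


class BestCandidate:
    """Keep only the lowest rate-distortion value seen so far and its modes."""

    def __init__(self):
        self.rd_value = math.inf
        self.prediction_mode = None
        self.transform_mode = None  # e.g. a (transform type, transform size) pair

    def offer(self, rd_value, prediction_mode, transform_mode):
        """Store the candidate if it beats the stored value; otherwise discard it.

        Returns True when the candidate replaced the stored one.
        """
        if rd_value < self.rd_value:
            self.rd_value = rd_value
            self.prediction_mode = prediction_mode
            self.transform_mode = transform_mode
            return True
        return False
```

Only one candidate is held at a time, so memory use does not grow with the number of loop passes.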

The decoding process of a video bitstream encoded as described herein may be as described with respect to FIG. 5. In the data sent within the bitstream, one or more bits may be used within a header to indicate the prediction mode and the transform mode (e.g., a transform size and type). When quantization is omitted from the encoding of a block, dequantization is omitted from the decoding of the block.
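On the decoder side, the header parsing and the conditional dequantization might look like the Python sketch below. The bit layout is entirely hypothetical (one prediction-mode bit, a transform bit only for inter blocks, then a quantization flag), transform size signaling is left out, and `BitReader` is a stand-in for a real bitstream reader.

```python
class BitReader:
    """Reads single bits from a list of 0/1 values (stand-in for a real bitstream)."""

    def __init__(self, bits):
        self.bits = bits
        self.position = 0

    def read_bit(self):
        bit = self.bits[self.position]
        self.position += 1
        return bit


def parse_block_header(reader):
    """Decode (prediction mode, transform type, quantized flag) under a hypothetical layout."""
    is_inter = reader.read_bit() == 1
    if is_inter:
        transform = "SDST" if reader.read_bit() == 1 else "DCT"
    else:
        transform = "DCT"  # in this layout intra blocks always use the DCT, so no bit is read
    quantized = reader.read_bit() == 1
    return ("inter" if is_inter else "intra"), transform, quantized


def dequantize(levels, step, quantized):
    """Skip dequantization when the encoder skipped quantization."""
    if not quantized:
        return list(levels)
    return [level * step for level in levels]
```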

The aspects of encoding and decoding described above illustrate some examples of encoding and decoding techniques. However, it is to be understood that encoding and decoding, as those terms are used in the claims, could mean compression, decompression, transformation, or any other processing or change of data.

The word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an implementation” or “one implementation” throughout is not intended to mean the same embodiment or implementation unless described as such.

Implementations of transmitting station 102 and/or receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by encoder 400 and decoder 500) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors or any other suitable circuit. In the claims, the term “processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms “signal” and “data” are used interchangeably. Further, portions of transmitting station 102 and receiving station 106 do not necessarily have to be implemented in the same manner.

Further, in one aspect, for example, transmitting station 102 or receiving station 106 can be implemented using a general purpose computer or general purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms and/or instructions described herein. In addition or alternatively, for example, a special purpose computer/processor can be utilized which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.

Transmitting station 102 and receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, transmitting station 102 can be implemented on a server and receiving station 106 can be implemented on a device separate from the server, such as a hand-held communications device. In this instance, transmitting station 102 can encode content using an encoder 400 into an encoded video signal and transmit the encoded video signal to the communications device. In turn, the communications device can then decode the encoded video signal using a decoder 500. Alternatively, the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by transmitting station 102. Other suitable transmitting station 102 and receiving station 106 implementation schemes are available. For example, receiving station 106 can be a generally stationary personal computer rather than a portable communications device and/or a device including an encoder 400 may also include a decoder 500.

Further, all or a portion of implementations of the present invention can take the form of a computer program product accessible from, for example, a tangible computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or a semiconductor device. Other suitable mediums are also available.

The above-described embodiments, implementations and aspects have been described in order to allow easy understanding of the present invention and do not limit the present invention. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structure as is permitted under the law.

What is claimed is:
1. An apparatus for decoding a current block from an encoded bitstream, the apparatus comprising: a memory; and a processor configured to execute instructions stored in the memory to: decode, from the encoded bitstream, a prediction mode of the current block; and decode the current block using a transform type selected from a set consisting of a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT), wherein on condition that the prediction mode is an inter prediction mode, the transform type is the SDST, and on condition that the prediction mode is an intra prediction mode, the transform type is the 2D DCT.
2. The apparatus of claim 1, wherein the instructions to decode, from the encoded bitstream, the prediction mode of the current block further comprise instructions to: read a single bit from the encoded bitstream to determine the prediction mode.
3. The apparatus of claim 1, wherein the current block comprises sub-blocks, and wherein the instructions to decode the current block using the SDST comprise instructions to: decode, from the encoded bitstream, a size of the SDST to be used to decode at least one sub-block of the sub-blocks.
4. The apparatus of claim 1, wherein the current block comprises sub-blocks, and wherein the instructions to decode the current block comprise instructions to: decode, from the encoded bitstream, a size of the 2D DCT to be used to decode at least one sub-block of the sub-blocks.
5. An apparatus for decoding a current block from an encoded bitstream, the apparatus comprising: a memory; and a processor configured to execute instructions stored in the memory to: decode, from the encoded bitstream, a prediction mode of the current block, the current block having a horizontal dimension and a vertical dimension; on condition that the prediction mode is an inter prediction mode, decode the current block using a transform type selected from a set consisting of a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT); and on condition that the prediction mode is an intra prediction mode, decode the current block using a first one-dimensional transform and a second one-dimensional transform, wherein the first one-dimensional transform is used in one of the horizontal dimension and the vertical dimension, and the second one-dimensional transform is used in the other of the horizontal dimension and the vertical dimension.
6. The apparatus of claim 5, wherein the first one-dimensional transform and the second one-dimensional transform are selected from a set consisting of a one-dimensional DCT (1D DCT) and a one-dimensional asymmetric discrete sine transform (1D ADST), wherein the first one-dimensional transform is used in the horizontal dimension of the current block and the second one-dimensional transform is used in the vertical dimension of the current block.
7. The apparatus of claim 6, wherein the first one-dimensional transform and the second one-dimensional transform are selected based on the prediction mode.
8. The apparatus of claim 6, wherein the first one-dimensional transform is the 1D ADST and the second one-dimensional transform is the 1D DCT.
9. The apparatus of claim 6, wherein the first one-dimensional transform is the 1D DCT and the second one-dimensional transform is the 1D ADST.
10. The apparatus of claim 6, wherein the first one-dimensional transform is the 1D ADST and the second one-dimensional transform is the 1D ADST.
11. The apparatus of claim 6, wherein the first one-dimensional transform is the 1D DCT and the second one-dimensional transform is the 1D DCT.
12. The apparatus of claim 5, wherein the current block is a sub-block of a prediction block, and wherein the instructions further comprise instructions to: read a transform block size for the current block from the encoded bitstream.
13. A method for decoding a current block from an encoded bitstream, the method comprising: decoding, from the encoded bitstream, a prediction mode of the current block, the current block comprising rows and columns; on condition that the prediction mode is an inter prediction mode, decoding the current block using a transform type selected from a set consisting of a symmetrical discrete sine transform (SDST) and a two-dimensional discrete cosine transform (2D DCT); and on condition that the prediction mode is an intra prediction mode, decoding the current block using a first one-dimensional transform and a second one-dimensional transform.
14. The method of claim 13, wherein the first one-dimensional transform and the second one-dimensional transform are selected from a set consisting of a one-dimensional DCT (1D DCT) and a one-dimensional asymmetric discrete sine transform (1D ADST), wherein the first one-dimensional transform is used in a horizontal direction of the current block and the second one-dimensional transform is used in a vertical direction of the current block.
15. The method of claim 14, wherein the first one-dimensional transform and the second one-dimensional transform are selected based on the prediction mode.
16. The method of claim 14, wherein the first one-dimensional transform is the 1D ADST and the second one-dimensional transform is the 1D DCT.
17. The method of claim 14, wherein the first one-dimensional transform is the 1D DCT and the second one-dimensional transform is the 1D ADST.
18. The method of claim 14, wherein the first one-dimensional transform is the 1D ADST and the second one-dimensional transform is the 1D ADST.
19. The method of claim 14, wherein the first one-dimensional transform is the 1D DCT and the second one-dimensional transform is the 1D DCT.
20. The method of claim 13, wherein the current block is a sub-block of a prediction block, and wherein the method further comprises: reading a transform block size for the current block from the encoded bitstream.